Bayesian Q-Learning

Authors

  • Richard Dearden
  • Nir Friedman
  • Stuart J. Russell
Abstract

A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information—the expected improvement in future decision quality that might arise from the information acquired by exploration. Estimating this quantity requires an assessment of the agent’s uncertainty about its current value estimates for states. In this paper, we adopt a Bayesian approach to maintaining this uncertain information. We extend Watkins’ Q-learning by maintaining and propagating probability distributions over the Q-values. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. We establish the convergence properties of our algorithm and show experimentally that it can exhibit substantial improvements over other well-known model-free exploration strategies.
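As a rough illustration of the action-selection rule described in the abstract, the sketch below keeps an independent Gaussian posterior over each Q(s, a) and picks the action maximising its expected Q-value plus a myopic value-of-perfect-information (VPI) bonus. This is a simplification for exposition only: the class name, the Gaussian posteriors, and the Kalman-style update toward the TD target are assumptions of this sketch, whereas the paper itself maintains Normal-Gamma distributions over the return and propagates them more carefully.

```python
# Illustrative sketch only: Gaussian posteriors over Q-values with myopic
# VPI-based action selection. The original algorithm uses Normal-Gamma
# distributions; the Gaussian model and the Kalman-style update below are
# simplifying assumptions of this sketch.
import math
from collections import defaultdict


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


class BayesianQAgent:
    def __init__(self, actions, gamma=0.95, prior_mu=0.0,
                 prior_sigma=10.0, obs_sigma=1.0):
        self.actions = list(actions)
        self.gamma = gamma
        self.obs_var = obs_sigma ** 2
        # Posterior over each Q(s, a): mean and variance, starting from a broad prior.
        self.mu = defaultdict(lambda: prior_mu)
        self.var = defaultdict(lambda: prior_sigma ** 2)

    def _vpi(self, s, a, best_a, q1, q2):
        """Myopic value of perfect information about Q(s, a).

        q1 and q2 are the expected Q-values of the best and second-best actions.
        Information has value only if it could change the greedy choice.
        """
        mu, sigma = self.mu[(s, a)], math.sqrt(self.var[(s, a)])
        if sigma == 0.0:
            return 0.0
        if a == best_a:
            # Gain q2 - Q whenever the best action turns out worse than the runner-up.
            z = (q2 - mu) / sigma
            return (q2 - mu) * _cdf(z) + sigma * _pdf(z)
        # Gain Q - q1 whenever this action turns out better than the current best.
        z = (mu - q1) / sigma
        return (mu - q1) * _cdf(z) + sigma * _pdf(z)

    def select_action(self, s):
        """Choose the action maximising expected Q-value plus myopic VPI."""
        ranked = sorted(self.actions, key=lambda a: self.mu[(s, a)], reverse=True)
        best_a = ranked[0]
        q1 = self.mu[(s, best_a)]
        q2 = self.mu[(s, ranked[1])] if len(ranked) > 1 else q1
        return max(self.actions,
                   key=lambda a: self.mu[(s, a)] + self._vpi(s, a, best_a, q1, q2))

    def update(self, s, a, r, s_next):
        """Move the Q(s, a) posterior toward the one-step TD target."""
        target = r + self.gamma * max(self.mu[(s_next, b)] for b in self.actions)
        k = self.var[(s, a)] / (self.var[(s, a)] + self.obs_var)  # Kalman-style gain
        self.mu[(s, a)] += k * (target - self.mu[(s, a)])
        self.var[(s, a)] *= (1.0 - k)
```

Exploration in this sketch comes entirely from the VPI term: actions with uncertain Q-values receive a bonus that shrinks as their posteriors narrow, so the rule reduces to greedy exploitation once the values are well estimated.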

Similar Articles

Bayesian Hypernetworks

We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, h, is a neural network which learns to transform a simple noise distribution, p(ε) = N(0, I), to a distribution q(θ) := q(h(ε)) over the parameters θ of another neural network (the "primary network"). We train q with variational inference, using an invertible h to ena...


Bayesian Deep Q-Learning via Continuous-Time Flows

Efficient exploration in reinforcement learning (RL) can be achieved by incorporating uncertainty into model predictions. Bayesian deep Q-learning provides a principled way to do this by modeling Q-values as probability distributions. We propose an efficient algorithm for Bayesian deep Q-learning by posterior sampling actions in the Q-function via continuous-time flows (CTFs), achieving efficient ...


Bayesian Ying-Yang system, best harmony learning, and five action circling

First proposed in 1995 and systematically developed over the past decade, Bayesian Ying-Yang learning is a statistical approach to a two-pathway intelligent system via two complementary Bayesian representations of a joint distribution over the external observation X and its inner representation R, which can be understood from the perspective of the ancient Ying-Yang philosophy. We have q(X...


Efficient Exploration through Bayesian Deep Q-Networks

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson-sampling-based Reinforcement Learning (RL) algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) mode...


Bayesian Q-learning with Assumed Density Filtering

While off-policy temporal difference methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation introduces non-linearity and inconsistent distributions over the value function. In this paper, we introduce a new Bayesia...


Uncertainty in action-value estimation affects both action choice and learning rate of the choice behaviors of rats

The estimation of reward outcomes for action candidates is essential for decision making. In this study, we examined whether and how the uncertainty in reward outcome estimation affects the action choice and learning rate. We designed a choice task in which rats selected either the left-poking or right-poking hole and received a reward of a food pellet stochastically. The reward probabilities o...



Publication year: 1998